An Efficient Data Preprocessing Procedure for Support Vector Clustering
Authors
Abstract
This paper presents an efficient data preprocessing procedure for support vector clustering (SVC) that reduces the size of the training dataset. In the SVC training procedure, solving the optimization problem and assigning cluster labels to the data points are both time-consuming, which makes SVC inefficient on large datasets. We propose a data preprocessing procedure to address this problem. The procedure incorporates a shared nearest neighbor (SNN) algorithm and uses the concept of unit vectors to eliminate insignificant data points from the dataset. Computer simulations on artificial and benchmark datasets demonstrate the effectiveness of the proposed method.
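The abstract names a shared nearest neighbor (SNN) step but does not spell out its details. The sketch below illustrates only the general SNN idea, under stated assumptions, and is not the paper's procedure: two points count as strongly linked when each appears in the other's k-nearest-neighbor list and the two lists overlap enough, and points with too few strong links are dropped before clustering. The function names and the parameters `k`, `min_shared`, and `min_density` are hypothetical. The unit-vector step is not covered.

```python
import math


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    sets = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort the other points by Euclidean distance to p.
        others.sort(key=lambda j: math.dist(p, points[j]))
        sets.append(set(others[:k]))
    return sets


def snn_density(points, k, min_shared):
    """Count, for each point, its strong SNN links (assumed definition).

    A link i-j is strong when i and j are in each other's k-NN lists
    and the two lists share at least `min_shared` members.
    """
    nn = knn_sets(points, k)
    density = []
    for i in range(len(points)):
        strong = 0
        for j in nn[i]:
            if i in nn[j] and len(nn[i] & nn[j]) >= min_shared:
                strong += 1
        density.append(strong)
    return density


def drop_sparse_points(points, k=3, min_shared=1, min_density=1):
    """Keep only points with at least `min_density` strong SNN links."""
    density = snn_density(points, k, min_shared)
    return [p for p, d in zip(points, density) if d >= min_density]
```

With a small cluster plus one distant point, the distant point has no mutual neighbors and is removed, which shrinks the set handed to the clustering step. The pairwise distance computation here is quadratic in the number of points, so a spatial index would be needed for large datasets.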
Related resources
An Efficient Predictive Model for Probability of Genetic Diseases Transmission Using a Combined Model
In this article, a new combined approach of a decision tree and clustering is presented to predict the transmission of genetic diseases. In this article, the performance of these algorithms is compared for more accurate prediction of disease transmission under the same condition and based on a series of measures like the positive predictive value, negative predictive value, accuracy, sensitivit...
An Intelligence-Based Model for Supplier Selection Integrating Data Envelopment Analysis and Support Vector Machine
The importance of supplier selection is nowadays highlighted more than ever as companies have realized that efficient supplier selection can significantly improve the performance of their supply chain. In this paper, an integrated model that applies Data Envelopment Analysis (DEA) and Support Vector Machine (SVM) is developed to select efficient suppliers based on their predicted efficiency sco...
Data Mining Methods for Recommender Systems
In this chapter, we give an overview of the main Data Mining techniques used in the context of Recommender Systems. We first describe common preprocessing methods such as sampling or dimensionality reduction. Next, we review the most important classification techniques, including Bayesian Networks and Support Vector Machines. We describe the k-means clustering algorithm and discuss several alte...
Support Vector Clustering for Outlier Detection
In this paper, a novel support vector clustering (SVC) method for outlier detection is proposed. Outlier detection algorithms have applications in several tasks such as data mining, data preprocessing, data filter-cleaner, time series analysis and so on. Traditionally, outlier detection methods are mostly based on modeling data according to its statistical properties, and these approaches are only prefe...
Diagnosis of diabetes by using a data mining method based on native data
Background & Aim: Detecting the abnormal performance of diabetes and subsequently getting proper treatment can reduce the mortality associated with the disease. Also, timely diagnosis will result in irreversible complications for the patient. The aim of this study was to determine the status of diabetes mellitus using data mining techniques. Methods: This is an analytical study and its databas...
Journal: J. UCS
Volume 15
Publication date: 2009